Sequence Covering Similarity for Symbolic Sequence Comparison

Author

  • Pierre-François Marteau
Abstract

This paper introduces the sequence covering similarity, which we formally define for evaluating the similarity between a symbolic sequence (string) and a set of symbolic sequences (strings). From this covering similarity we derive a pair-wise distance to compare two symbolic sequences. We show that this covering distance is a semimetric. A few examples are given to show how this string semimetric, computable in O(n · log n), compares with the Levenshtein distance, which is in O(n²). A final example presents its application to plagiarism detection.
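
To make the comparison in the abstract concrete, the following Python sketch contrasts a naive covering-based distance with the classic Levenshtein dynamic program. The abstract does not give the formulas, so everything here is an assumption: the covering is taken to be the smallest number of pieces of s that each occur as a substring of some reference sequence (an unseen symbol counts as its own piece), the similarity is assumed to be (|s| - pieces + 1) / |s|, and the distance is assumed to be one minus the mean of the two directed similarities. The function names are hypothetical, and the substring test is a plain scan, so this sketch does not reach the O(n · log n) bound quoted above.

```python
def covering_size(s, refs):
    """Fewest pieces covering s, each piece a substring of some reference.

    Greedy longest match; a symbol found in no reference is its own piece.
    """
    count, i = 0, 0
    while i < len(s):
        length = 1  # every piece consumes at least one symbol
        # Extend the piece while the longer candidate still occurs in a reference.
        while i + length < len(s) and any(s[i:i + length + 1] in r for r in refs):
            length += 1
        i += length
        count += 1
    return count


def covering_similarity(s, refs):
    """ASSUMED normalisation: (|s| - pieces + 1) / |s|; 1.0 for an empty s."""
    if not s:
        return 1.0
    return (len(s) - covering_size(s, refs) + 1) / len(s)


def covering_distance(a, b):
    """ASSUMED pair-wise form: 1 minus the mean of both directed similarities."""
    return 1.0 - 0.5 * (covering_similarity(a, [b]) + covering_similarity(b, [a]))


def levenshtein(a, b):
    """Standard O(|a| * |b|) edit distance, kept to two rows of the table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]
```

For instance, under these assumptions `covering_size("abxcd", ["abcd"])` splits the string into the pieces `ab`, `x`, `cd`, while `levenshtein("abxcd", "abcd")` counts the single deletion of `x`.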

Similar resources

Sequence Covering for Efficient Host-Based Intrusion Detection

This paper introduces a new similarity measure, the covering similarity, that we formally define for evaluating the similarity between a symbolic sequence and a set of symbolic sequences. A pair-wise similarity can also be directly derived from the covering similarity to compare two symbolic sequences. An efficient implementation to compute the covering similarity is proposed that uses a suffix...

Similarity of symbolic sequences

A new numerical characterization of symbolic sequences is proposed. The partition of a sequence based on the Ke and Tong algorithm is the starting point. The algorithm decomposes the original sequence into a set of distinct subsequences, or patterns. The set of subsequences common to two symbolic sequences (their intersection) is proposed as a measure of similarity between them. The new similarity measure works w...

A New Approach to Detect Congestive Heart Failure Using Symbolic Dynamics Analysis of Electrocardiogram Signal

The aim of this study is to show that the measures derived from Electrocardiogram (ECG) signals many a time perform better than the same measures obtained from heart rate (HR) signals. A comparison was made to investigate how far the nonlinear symbolic dynamics approach helps to characterize the nonlinear properties of ECG signals and HR signals, and thereby discriminate between normal and cong...

Genetic variations of avian Pasteurella multocida as demonstrated by 16S-23S rRNA gene sequences comparison

Pasteurella multocida is an important heterogeneous bacterial agent that causes severe diseases such as fowl cholera in poultry and haemorrhagic septicaemia in cattle and buffalo. A polymerase chain reaction (PCR) assay was developed using primers derived from a conserved part of the 16S-23S rRNA gene. The PCR amplified a fragment of 0.7 kb using DNA from nine avian P. multocida isolates...

Journal:
  • CoRR

Volume abs/1801.07013

Publication date: 2018